A new AlphaFold model with a substantially updated diffusion-based architecture is described that is capable of predicting the joint structure of complexes including proteins, nucleic acids, small molecules, ions and modified residues, showing that high-accuracy modelling across biomolecular space is possible within a single unified deep-learning framework.
It is found that Llama 3 delivers quality comparable to leading language models such as GPT-4 across a wide range of tasks, and performs competitively with the state of the art on image, video, and speech recognition tasks.
This work improves existing noise sampling techniques for training rectified flow models by biasing them towards perceptually relevant scales and presents a novel transformer-based architecture for text-to-image generation that uses separate weights for the two modalities and enables a bidirectional flow of information between image and text tokens.
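As a rough illustration of the biased timestep sampling described above, the following sketch (in PyTorch) assumes a logit-normal density over the interpolation time t of a rectified flow; the mean and scale parameters, and the model's (x, t) signature, are illustrative assumptions rather than details taken from the paper.

    import torch

    def sample_timesteps(batch_size, mean=0.0, std=1.0):
        # Logit-normal sampling: a Gaussian draw squashed through a sigmoid,
        # which concentrates timesteps around the middle of [0, 1], where the
        # perceptually relevant noise scales are assumed to lie.
        return torch.sigmoid(torch.randn(batch_size) * std + mean)

    def rectified_flow_loss(model, x0):
        # Rectified flow training: interpolate linearly between data x0 and
        # Gaussian noise, then regress the model output onto the constant
        # velocity (noise - x0) at the sampled timestep.
        t = sample_timesteps(x0.shape[0]).view(-1, *([1] * (x0.dim() - 1)))
        noise = torch.randn_like(x0)
        xt = (1 - t) * x0 + t * noise
        return ((model(xt, t.flatten()) - (noise - x0)) ** 2).mean()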
This work introduces Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model that vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks, and provides a model fine-tuned to follow instructions, Mixtral 8x7B-Instruct, that surpasses GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and the Llama 2 70B chat model on human benchmarks.
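The sparse routing at the heart of such a model is easy to sketch: a router picks two of eight expert MLPs per token and mixes their outputs with renormalised gate weights. The dimensions, expert shape, and class name below are illustrative, not Mixtral's actual configuration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Top2MoE(nn.Module):
        # Minimal sparse Mixture-of-Experts layer: 8 experts, 2 active per token.
        def __init__(self, dim=64, hidden=256, n_experts=8, top_k=2):
            super().__init__()
            self.router = nn.Linear(dim, n_experts, bias=False)
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
                for _ in range(n_experts)
            ])
            self.top_k = top_k

        def forward(self, x):                              # x: (tokens, dim)
            logits = self.router(x)                        # (tokens, n_experts)
            gates, idx = logits.topk(self.top_k, dim=-1)   # 2 experts per token
            gates = F.softmax(gates, dim=-1)               # renormalise their weights
            out = torch.zeros_like(x)
            for k in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, k] == e                  # tokens routed to expert e
                    if mask.any():
                        out[mask] += gates[mask, k:k+1] * expert(x[mask])
            return out

Only two expert MLPs run for any given token, which is why such a model carries the parameter count of all eight experts but roughly the inference compute of two.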
Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models ranging in scale from 2 billion to 27 billion parameters, delivers the best performance for its size and even offers competitive alternatives to models 2-3 times bigger.
This System Card provides a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories, focusing on speech-to-speech while also evaluating text and image capabilities, along with the measures the authors have implemented to ensure the model is safe and aligned.
The introduction is organized in a unique didactic manner developed by the authors, starting from simpler concepts such as linear programming and single-point methods, and advancing from these to more difficult concepts such as optimality conditions for nonlinear optimization and set-oriented solution algorithms.
This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models, and presents comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of model development.
OpenVLA, a 7B-parameter open-source VLA trained on a diverse collection of 970k real-world robot demonstrations, is introduced, and it is shown that OpenVLA can be effectively fine-tuned for new settings, with especially strong generalization results in multi-task environments involving multiple objects and strong language grounding abilities.
The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2, a large model that significantly outperforms other models of comparable size, and makes the model weights available under an OpenRAIL license.
OLMo, a competitive, truly Open Language Model, is built to enable the scientific study of language models, and it is hoped this release will empower the open research community and inspire a new wave of innovation.
The experimental results show that the ANOVA F-test feature selection algorithm, together with the Support Vector Machine classifier, is a viable approach for developing an advanced intelligent system that can identify heart disease.
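In scikit-learn terms, this pairing is a SelectKBest step using the f_classif (ANOVA F-test) score function in front of an SVM. The sketch below substitutes a bundled dataset for the heart-disease data, and k is an illustrative choice.

    from sklearn.datasets import load_breast_cancer   # stand-in for heart-disease data
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)

    # The ANOVA F-test scores each feature by between-class versus within-class
    # variance; SelectKBest keeps the k highest-scoring features for the SVM.
    pipeline = make_pipeline(
        StandardScaler(),
        SelectKBest(f_classif, k=10),
        SVC(kernel="rbf"),
    )
    print(cross_val_score(pipeline, X, y, cv=5).mean())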
Recent improvements to Job Dispatcher are reviewed, including its brand-new website and documentation, enhanced visualisations, improved job management, and a rising trend of user reliance on the service from low- and middle-income regions.
The development of TRIPOD+AI is described, and the expanded 27-item checklist is presented with a more detailed explanation of each reporting recommendation, together with the TRIPOD+AI for Abstracts checklist.
To facilitate scientific research on language model pretraining, Dolma, a three-trillion-token English corpus built from a diverse mixture of web content, scientific papers, code, public-domain books, social media, and encyclopedic materials, is curated and released.
GPT-4 significantly outperforms both human test-takers and prior models, demonstrating a 26% increase over ChatGPT and beating humans in five of seven subject areas; these results document not just the rapid and remarkable advance of large language model performance generally, but also the potential for such models to support the delivery of legal services in society.
The RewardBench dataset, a collection of prompt-chosen-rejected trios spanning chat, reasoning, and safety, is introduced to benchmark how reward models perform on challenging, structured, and out-of-distribution queries, and many findings are presented on the propensity for refusals, reasoning limitations, and instruction-following shortcomings of various reward models, towards a better understanding of the RLHF process.
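The metric such a benchmark reduces to is straightforward: over the trios, count how often a reward model scores the chosen response above the rejected one. In the sketch below, score is a hypothetical stand-in for whatever scoring call a particular reward model exposes.

    def rewardbench_accuracy(trios, score):
        # trios: list of (prompt, chosen, rejected) strings.
        # score: hypothetical callable returning a scalar reward for a
        # (prompt, response) pair; any reward model can be wrapped this way.
        correct = sum(
            score(prompt, chosen) > score(prompt, rejected)
            for prompt, chosen, rejected in trios
        )
        return correct / len(trios)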
Two below-threshold surface code memories on Willow, a distance-7 code and a distance-5 code integrated with a real-time decoder, indicate device performance that, if scaled, could realize the operational requirements of large-scale fault-tolerant quantum algorithms.
A new measure of firm-level AI investments is proposed, using a unique combination of worker resume and job postings datasets, which reveals a stark increase in AI investments across sectors.
This work facilitates low-precision KV cache quantization by incorporating several novel methods, including Per-Channel Key Quantization, and develops custom CUDA kernels for KVQuant, which enables serving LLaMA-7B with a context length of up to 1 million on a single A100-80GB GPU and up to 10 million on an 8-GPU system.
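A rough sketch of what per-channel Key quantization means, assuming a simple uniform scheme: each channel (column) of the key matrix gets its own scale and zero-point, computed across tokens, because outlier magnitudes in keys tend to cluster by channel rather than by token. Shapes and the bit width below are illustrative.

    import torch

    def quantize_keys_per_channel(K, bits=4):
        # K: (tokens, head_dim). Scale/zero-point are per channel, i.e. reduced
        # over the token axis (dim=0), not per token.
        qmax = 2 ** bits - 1
        kmin = K.min(dim=0, keepdim=True).values           # (1, head_dim)
        kmax = K.max(dim=0, keepdim=True).values
        scale = (kmax - kmin).clamp(min=1e-8) / qmax
        q = ((K - kmin) / scale).round().clamp(0, qmax).to(torch.uint8)
        return q, scale, kmin

    def dequantize(q, scale, kmin):
        return q.float() * scale + kmin

    K = torch.randn(128, 64)                               # illustrative key cache slice
    q, scale, kmin = quantize_keys_per_channel(K)
    print((dequantize(q, scale, kmin) - K).abs().max())    # reconstruction error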
PaliGemma is an open Vision-Language Model, based on the SigLIP-So400m vision encoder and the Gemma-2B language model, that achieves strong performance on a wide variety of open-world tasks.
Molmo is presented, a new family of VLMs that are state-of-the-art in their class of openness, with a novel, highly detailed image caption dataset collected entirely from human annotators using speech-based descriptions.
The Cosmos World Foundation Model Platform is presented to help developers build customized world models for their Physical AI setups, positioning a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications.
A novel post-training recipe significantly improves math, chat, instruction-following, and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks.
This paper tries to answer the question: How does ICT affect leadership in virtual teams?
Code development continues in line with the Galaxy Project roadmap, with improvements to job scheduling and the user interface, general-purpose graphics processing unit (GPGPU) access for cutting-edge methods, and licensed tool support.
This work introduces Aya, a massively multilingual generative language model that follows instructions in 101 languages, of which over 50% are considered lower-resourced, and introduces extensive new evaluation suites that broaden the state of the art for multilingual evaluation across 99 languages.
DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens, which substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2 while maintaining comparable performance in general language tasks.
An extensive evaluation of 60 LLMs shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%, which underscores the need for further advancements in this area.
It is argued that social scientists can address many of these limitations of Generative AI by creating open-source infrastructure for research on human behavior, not only to ensure broad access to high-quality research tools, but also because the progress of AI will require deeper understanding of the social forces that guide human behavior.
A simple approach to joint named entity recognition and relation extraction is presented, and it is demonstrated how pretrained large language models can be fine-tuned to extract useful records of complex scientific knowledge.
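The framing amounts to fine-tuning a model to map a passage directly to structured records, so entities and the relations linking them are produced in a single pass. The schema and example below are invented for illustration and are not taken from the paper.

    # One hypothetical training pair: the passage is the prompt, and the
    # completion is a list of records in which each field value is an entity
    # and the record structure itself encodes the relations between them.
    example = {
        "prompt": "GaN thin films were grown on sapphire substrates by MOCVD.",
        "completion": [
            {"material": "GaN", "form": "thin film",
             "substrate": "sapphire", "growth_method": "MOCVD"},
        ],
    }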
Blink, a new benchmark for multimodal large language models (LLMs) that focuses on core visual perception abilities not found in other evaluations, is introduced and is expected to stimulate the community to help multimodal LLMs catch up with human-level visual perception.
Virchow is presented, the largest foundation model for computational pathology to date, and it is demonstrated that a large foundation model enables pan-cancer detection and can achieve similar performance to tissue-specific clinical-grade models in production and outperform them on some rare variants of cancer.
It is demonstrated that large language models exhibit behaviour consistent with the outputs of mentalistic inference in humans, but the importance of systematic testing is also highlighted to ensure a non-superficial comparison between human and artificial intelligences.
The primary goal is to bridge the language gap by building a human-curated instruction-following dataset spanning 65 languages, as well as the most extensive multilingual collection to date, comprising 513 million instances created by templating and translating existing datasets across 114 languages.
The paper delves into specific GNN models such as graph convolutional networks (GCNs), GraphSAGE, and graph attention networks (GATs), which are widely used in various applications today, and offers an extensive overview of the landscape of GNN research and its practical implementations.
This work reviews the latest efforts towards hardware-based memristive artificial neural networks (ANNs), describing in detail the working principles of each block and the different design alternatives, with their respective advantages and disadvantages, as well as the tools required for accurate estimation of performance metrics.
The latest version of SwissDock is presented, in which EADock DSS has been replaced by two state-of-the-art docking programs, Attracting Cavities and AutoDock Vina, and user-friendly command-line access has been developed that enables covalent ligand docking with Attracting Cavities.
This work introduces Latent Adversarial Diffusion Distillation (LADD), a novel distillation approach that overcomes the limitations of ADD by utilizing generative features from pretrained latent diffusion models, enabling high-resolution multi-aspect-ratio image synthesis.
Article Galaxy Pages is a free service from Research Solutions, a company that offers access to content in collaboration with publishing partners, online repositories and discovery services.